Mirtrons are a special type of pre-miRNA which originate from intronic regions and are spliced directly from the transcript instead of being processed by Drosha. The splicing mechanism is better understood for the processing of mRNA, for which it was established that there is a characteristic CG content around splice sites. Here we analyse the CG-content ratio of pre-miRNAs and mirtrons and compare them with their genomic neighbourhood in an attempt to establish key properties which are easy to evaluate and which help to understand their biogenesis. We propose a simple log-ratio of the CG-content comparing the precursor sequence and its flanking region. We discovered that Caenorhabditis elegans and Drosophila melanogaster mirtrons, so far without exception, have smaller CG-content than their genomic neighbourhood. This is markedly different from usual pre-miRNAs, which mostly have larger CG-content when compared to their genomic neighbourhood. We also analysed some mammalian and primate mirtrons which, in contrast to the invertebrate mirtrons, have a higher CG-content ratio.



INTRODUCTION 

During the last decade, a wealth of small RNAs was discovered and with them new classes of biological regulators emerged. Among those, microRNAs (or miRNAs) are perhaps the most intensively studied, due to their crucial role in genomic regulation. miRNAs are involved in the regulation of numerous cellular processes including differentiation, development, apoptosis, proliferation and the stress response, and they change the expression of genes in several human diseases such as diabetes, cancer and neuromuscular dystrophy.

miRNAs are non-coding RNAs first identified in 1993 in the nematode Caenorhabditis elegans. Canonical miRNAs are derived from primary miRNA transcripts (pri-miRNA), usually long nucleotide sequences that form specific hairpin-shaped stem-loop secondary structures. A pri-miRNA may give rise to one or more hairpins, typically 55-70 nucleotides (nt) in length. In animals, pri-miRNAs are cleaved by the nuclear Drosha RNase III enzyme to release precursor miRNA (pre-miRNA) hairpins. These are then transported to the cytoplasm by Exportin-5 (Exp5) and cleaved by the Dicer RNase III enzyme to generate a very short miRNA/miRNA* duplex [5]. One of the strands, called mature miRNA (22-25 nt), is incorporated into a RISC complex (RNA induced silencing complex) and guides the complex to the target mRNA to regulate gene expression, while the other strand seems to take on other biological functions. In animals, most of the miRNA functions are related to down-regulation of genes.

Ruby et al. showed the existence of intronic pre-miRNAs in Drosophila melanogaster and C. elegans that bypass Drosha processing, providing an alternative pathway for miRNA biogenesis. These pre-miRNAs were called 'mirtrons', and the main difference between them and canonical miRNAs is that the intronic sequences form lariats and the mirtrons are generated by splicing [8-10].



Flynt et al. [11] reclassified a subset of mirtrons in D. melanogaster as "tailed mirtrons", which have substantial 3' overhangs and are targets of exosome-mediated 3'-5' trimming, which allows functional pre-miRNA to be generated. The existence of mirtrons in mammals (human, macaque, chimpanzee, rat and/or mouse) was reported by Berezikov et al. [10], who identified, using computational and experimental strategies, 3 well conserved mirtrons expressed in diverse mammals, 16 primate-specific mirtrons, and 46 candidate mirtrons in primates.

For mRNA, which is processed by splicing, Zhang et al. [12] determined that there is a characteristic CG content around splice sites. Also, it was shown that alternative splicing is promoted by the secondary RNA structure [13], which is strongly determined by CG content [14]. MicroRNAs are co-expressed with mRNAs [15], [16] and, in particular, mirtrons are seemingly not processed by the Drosha microprocessor but by splicing only. With splicing being dependent on thermodynamic stability, could there be some characteristic CG content which would set mirtrons apart from ordinary pre-miRNAs? Of special interest would be properties which would help to understand the splicing mechanism proposed for mirtrons [9].

Here we set out to characterise precursor sequences of miRNAs and mirtrons in terms of CG-content and also Gibbs free energies for D. melanogaster and C. elegans. We found that the CG-content shows marked differences between the two types of small RNA. We also performed the same analysis for the mammalian mirtrons reported by Berezikov et al. [10]. Again our results show important differences, albeit opposite to those for the two invertebrates.
METHODS

To characterise the small RNAs we compare the CG-content of the precursor sequences which originate the pre-miRNAs and mirtrons to the CG-content of their neighbouring regions. The rationale for this approach is that if the neighbouring DNA sequence has an important difference in thermodynamic stability when compared to the precursor sequence, there should be telltale signs of it in the CG-content fractions. We define the CG-content fraction as

$$ f = \frac{\text{number of C and G nucleotides}}{\text{total number of nucleotides}} \tag{1} $$

Two types of CG-content fractions are used. One, $f_P$, is related to either the precursor miRNA or the precursor mirtron. The other, $f_N$, accounts for the total CG-content of the 150 base pairs downstream and upstream of the precursor sequence, which form the neighbourhood of the precursor. The flanking sequence length was chosen to be of the same order of magnitude as the length of canonical pre-miRNAs. We performed the same analysis with longer flanking sequences (up to 250 nt, not presented), but found no difference from the results reported in this work. Both CG-contents are combined to form a log-ratio between the precursor and its neighbourhood

$$ R = \log \frac{f_P}{f_N} \tag{2} $$

The genome files used were version r5.34 for D. melanogaster [20] and version WS223 for C. elegans [21]. We retrieved the precursor miRNA/mirtron sequences by searching for an exact match within the complete genome files. For each sequence four types of matches were performed: the original sequence, the reversed sequence, the complementary sequence and the reversed-complementary sequence.
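
The procedure described above (exact-match search in four orientations, extraction of the flanks, CG-content fraction of Eq. (1) and log-ratio of precursor to neighbourhood) can be illustrated with a minimal Python sketch. It is not the code used in this work. The function names, the conversion of U to T before searching, the assumption of an upper-case genome string and the use of the natural logarithm are all assumptions made for illustration only.

```python
import math

# DNA complement table; characters outside ACGT (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cg_fraction(seq):
    """CG-content fraction: (number of C and G) / (total number of nucleotides)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq)


def orientations(dna):
    """The four variants that are searched for in the genome."""
    comp = dna.translate(_COMPLEMENT)
    return {
        "original": dna,
        "reversed": dna[::-1],
        "complementary": comp,
        "reversed-complementary": comp[::-1],
    }


def locate(genome, precursor):
    """Return (orientation, start) of the first exact match, or None."""
    # Assumption: precursors given as RNA (U) are mapped to DNA (T) first,
    # and the genome is an upper-case DNA string.
    dna = precursor.upper().replace("U", "T")
    for name, variant in orientations(dna).items():
        start = genome.find(variant)
        if start != -1:
            return name, start
    return None


def log_ratio(genome, precursor, flank=150):
    """Log-ratio of the precursor CG-content to that of its two flanks."""
    hit = locate(genome, precursor)
    if hit is None:
        raise LookupError("precursor not found in genome")
    _, start = hit
    end = start + len(precursor)
    neighbourhood = genome[max(0, start - flank):start] + genome[end:end + flank]
    f_p = cg_fraction(genome[start:end])
    f_n = cg_fraction(neighbourhood)
    # Natural logarithm is an assumption; this raises if either fraction is zero.
    return math.log(f_p / f_n)
```

Because reversing or complementing a sequence does not change its number of C and G nucleotides, the CG-content fraction can be computed directly on the matched genome segment, whichever orientation produced the match. A positive return value corresponds to a precursor with larger CG-content than its neighbourhood.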

The mirtrons reported by Berezikov et al. [10] were collected from the supplemental data; the neighbouring sequences for each of these mirtrons were obtained from the Ensembl API and databases [22].

To complete our analysis we also calculated the average Gibbs free energies of mirtrons and ordinary pre-miRNAs. In this work we use the RNAfold program from the Vienna package [23] with default parameters to obtain the Gibbs free energies, $\Delta G$.
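
As an illustration of how such free energies could be collected programmatically, the sketch below calls the `RNAfold` command-line program and reads the energy from its text output. This is a hypothetical helper, not the procedure used in this work. It assumes a ViennaRNA 2.x installation with `RNAfold` on the PATH, and the `--noPS` flag (which only suppresses the structure drawing) is our addition. The function names are illustrative.

```python
import re
import subprocess

# RNAfold prints the structure followed by the energy in parentheses,
# e.g. "(((....))) ( -1.20)"; the pattern captures the number at line end.
_ENERGY = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold_energy(output):
    """Extract the free energy (kcal/mol) from RNAfold's text output."""
    for line in output.splitlines():
        match = _ENERGY.search(line)
        if match:
            return float(match.group(1))
    raise ValueError("no energy found in RNAfold output")


def fold_energy(sequence):
    """Fold one sequence with RNAfold default parameters and return its energy."""
    result = subprocess.run(
        ["RNAfold", "--noPS"],  # assumption: ViennaRNA 2.x command-line syntax
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold_energy(result.stdout)
```

Averaging `fold_energy` over all precursors of one class would then give a mean free energy for that class.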



A positive ratio means that the CG-content $f_P$ of the precursor sequence is larger than that of its neighbours. Since CG-content is related to thermodynamic stability we may infer that $R > 0$ generally means that the flanking DNA region is less stable than the precursor region. To ease the notation we use

$R^+ \rightarrow R > 0$: precursor region has larger CG-content, $f_P > f_N$
$R^- \rightarrow R < 0$: flanking region has larger CG-content, $f_N > f_P$

To evaluate the statistical significance of our findings we use the Kolmogorov-Smirnov test [17]. Even though this statistical test is well established, given the question which is posed in this work it is perhaps more intuitive and simpler to quantify the significance by using simple combinatorial probabilities. Therefore, we also calculate the probability $p^-$ of drawing $k$ pre-miRNAs, all with $R^-$, purely by chance

$$ p^- = \left( \frac{n^-}{n^- + n^+} \right)^k \tag{3} $$

where $n^-$ and $n^+$ are the numbers of known pre-miRNAs with $R^-$ and $R^+$, respectively.

The database used to obtain the precursor miRNAs and mirtrons of D. melanogaster and C. elegans was miRBase version 16 [18], [19], which is one of the main on-line repositories for microRNA sequences. For extracting the flanking sequences we used the complete genome file of each organism.

RESULTS AND DISCUSSION

FIG. 1. CG-content ratio R distribution for a) 151 canonical miRNAs and b) 18 mirtrons of D. melanogaster and c) 170 canonical miRNAs and d) 5 mirtrons of C. elegans.



In Fig. 1 we show the distribution of the CG-content log-ratio R, defined in Methods, for canonical pre-miRNAs and mirtrons of D. melanogaster and C. elegans. The CG-content log-ratio R for canonical pre-miRNAs, Fig. 1a, is roughly Gaussian with a peak around R = 0. This means that for this type of pre-miRNA there appears to be no strong preferential ratio for CG-content between the precursor sequence and its neighbours, although a bias towards $R^+$ is clearly noticeable. In stark contrast, all 18 mirtrons of D. melanogaster have $R < 0$ ($R^-$), as shown in Fig. 1b. Even though the number of reported mirtrons is still small, the probability of picking 18 small RNAs with $R^-$ by chance alone, considering the distribution for canonical pre-miRNAs, is $p^- = 9.1 \times 10^{-9}$, see Eq. (3). The Kolmogorov-Smirnov distribution test yields $p^- = 7.8 \times 10^{-9}$, which essentially confirms the simple combinatorial probability. Therefore, the occurrence of 18 $R^-$ pre-miRNAs entirely by chance is very unlikely.

TABLE I. CG-content ratio R and free energy ΔG characteristics of canonical pre-miRNAs and mirtrons of invertebrates. Also shown are the number of sequences with R±, the average CG-content ratio ⟨R⟩, the average free energy ⟨ΔG⟩ and the average precursor length ⟨N⟩.

Organism         RNA type             total  n+    n-   n+ : n-   ⟨R⟩           ⟨ΔG⟩ (kcal/mol)   ⟨N⟩ (nt)
C. elegans       mirtrons             5      0     5    -         0.81 ± 0.08   -20.28 ± 4.21     62.22 ± 6.72
D. melanogaster  mirtrons             18     0     18   -         0.76 ± 0.10   -22.03 ± 6.55     69.05 ± 16.68
D. melanogaster  tailed mirtrons      7      2     5    1 : 1.8   0.84 ± 0.24   -20.36 ± 10.19    93.00 ± 42.24
C. elegans       canonical miRNAs a   170    131   39   3.3 : 1   1.18 ± 0.23   -35.10 ± 9.09     91.78 ± 14.26
D. melanogaster  canonical miRNAs a   151    97    54   1.8 : 1   1.08 ± 0.21   -33.89 ± 8.25     94.00 ± 18.50

a mirtrons excluded.

TABLE II. CG-content ratio R and free energy ΔG characteristics of canonical miRNAs considering only those with R-.

Organism         RNA type           n-   ⟨R-⟩          ⟨ΔG⟩ (kcal/mol)   ⟨N⟩ (nt)
C. elegans       canonical miRNAs   39   0.90 ± 0.08   -30.17 ± 8.72     90.95 ± 17.50
D. melanogaster  canonical miRNAs   54   0.87 ± 0.10   -30.0 ± 5.40      90.40 ± 11.78

Some authors describe mirtrons as tightly packed between exons, but in our analysis we have found that this is not the case. Most mirtrons are surrounded by intronic sequences, not exons. This seems consistent if one considers that intronic regions of D. melanogaster are about 750 to 1000 nt in length on average [24] and that mirtrons are typically 60 nt in length. Therefore, $R^-$ means that the immediate flanking region, which is also intronic, is more stable than the precursor region. One possible explanation for the predominance of $R^-$ would be if intronic regions had the highest CG-content. However, the intronic regions of D. melanogaster have one of the smallest CG-contents in this genome: 0.4 as compared to 0.52 for coding regions. The fact that the surrounding region of mirtrons has a higher CG content, which is unusual for intronic regions, suggests a role in the processing of these special types of miRNAs. Therefore, we may speculate that $R^-$ may play a role in the mirtron splicing mechanism in a similar fashion to what happens for messenger RNA [12].

For canonical C. elegans pre-miRNAs we observe a similar Gaussian-shaped distribution of the ratio R (Fig. 1c) but with a strong bias towards $R^+$. Tab. I shows that the $n^+ : n^-$ ratio is about three $R^+$ pre-miRNAs for each $R^-$ pre-miRNA. To date only five mirtrons have been reported and they all show $R^-$ (Fig. 1d), similar to the mirtrons of D. melanogaster. Even though this number is very small, it is still intriguing given the strong bias toward $R^+$ in canonical pre-miRNAs. Indeed, the probability of picking 5 pre-miRNAs all with $R^-$ is small, the combinatorial probability being $p^- = 6.3 \times 10^{-4}$. Again, the Kolmogorov-Smirnov test provides $5.5 \times 10^{-4}$, in agreement with the combinatorial probability.
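
The combinatorial probabilities quoted for both organisms can be checked with a few lines of Python. The sketch assumes that each of the k draws is independent and yields a pre-miRNA with negative R with probability n-/(n- + n+), taking the counts of canonical pre-miRNAs from Tab. I. Under this assumption the results agree with the quoted values to within rounding. The function and variable names are ours.

```python
def chance_probability(n_minus, n_plus, k):
    """Probability that k independent draws all have negative R."""
    if k < 0 or n_minus < 0 or n_plus < 0 or n_minus + n_plus == 0:
        raise ValueError("counts must be non-negative and not all zero")
    return (n_minus / (n_minus + n_plus)) ** k


# (n-, n+, number of mirtrons k), counts of canonical pre-miRNAs from Tab. I
cases = {
    "D. melanogaster": (54, 97, 18),
    "C. elegans": (39, 131, 5),
}

for organism, (n_minus, n_plus, k) in cases.items():
    p = chance_probability(n_minus, n_plus, k)
    print(f"{organism}: p- = {p:.2e}")
```

The much smaller probability for D. melanogaster mainly reflects the larger number of mirtrons (18 against 5), since the fraction of canonical pre-miRNAs with negative R is actually higher in D. melanogaster than in C. elegans.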

To complete our comparative analysis of mirtrons and canonical pre-miRNAs, we also calculated the Gibbs free energies. Clearly, given the $R^-$ nature of the mirtrons, one would expect these to be generally less stable than the average pre-miRNAs. In Fig. 2 we show the distribution of the free energy $\Delta G$ for both invertebrates, and detailed quantities are also given in Tab. I. With one notable exception, all mirtrons show $\Delta G$ larger than -30 kcal/mol, confirming their instability. In contrast, canonical pre-miRNAs are distributed over a much larger range of energies. Certainly, the fact that mirtrons are much shorter than canonical pre-miRNAs, 60 nt compared to 90 nt on average, largely accounts for this. But is a free energy larger than -30 kcal/mol sufficient to result in $R^-$? To answer this, we isolated all canonical pre-miRNAs with $R^-$ and recalculated their $\Delta G$ distribution, which is shown as red bars in Figs. 2a and 2c and summarised in Tab. II. Essentially, we find a considerable number of $R^+$ pre-miRNAs with $\Delta G > -30$ kcal/mol. In other words, a pre-miRNA with $\Delta G > -30$ kcal/mol is not necessarily $R^-$. Therefore, the free energy distribution alone does not explain why all mirtrons of D. melanogaster and C. elegans are $R^-$.



The next question is whether other types of reported mirtrons, such as primate and mammalian mirtrons, show the same R distribution as those of D. melanogaster and C. elegans. As shown in Tab. III, in terms of CG-content ratio R and average free energy $\Delta G$ these mirtrons appear not to be biased to any particular value. Berezikov et al. [10] found that the GC content of mammalian mirtrons was much higher than that of invertebrate miRNAs but, in comparison with their neighbouring regions, we found that they generally tend to $R^+$ (precursor region has larger CG-content), see Fig. 3. We have not attempted to generate the distribution of mammalian pre-miRNAs due to the number of large genomes which would have to be processed.
TABLE III. CG-content ratio R and free energy ΔG characteristics of specific vertebrate mirtrons.

Organism    RNA type            total  n+   n-  n+ : n-   ⟨R⟩           ⟨ΔG⟩ (kcal/mol)   ⟨N⟩ (nt)
mammalians  putative mirtrons   13     8    5   1.5 : 1   0.98 ± 0.16   -30.02 ± 13.20    87.92 ± 9.52
primates    specific mirtrons   16     13   3   4.3 : 1   1.12 ± 0.11   -30.32 ± 11.69    83.62 ± 24.36
primates    candidate mirtrons  45 b   40   5   8 : 1     1.11 ± 0.09   -41.18 ± 12.39    90.44 ± 21.94

b Berezikov et al. [10] report 46 candidate mirtrons for primates, yet the supplementary tables only show 45 sequences.



FIG. 2. Average free energy ΔG (kcal/mol) distribution for a) 151 canonical pre-miRNAs and b) 18 mirtrons of D. melanogaster, and c) 170 canonical miRNAs and d) 5 mirtrons of C. elegans. Red bars are for R- miRNAs.



CONCLUSIONS 



We have introduced the concept of CG-content log-ratio of precursor sequences and flanking regions and discovered that all D. melanogaster and C. elegans mirtrons are $R^-$. This can be explained neither by the CG-content of the intronic region nor by the fact that mirtrons are generally shorter and less stable than pre-miRNAs. Usual pre-miRNAs of these organisms only show a moderate bias towards $R^+$. This finding appears to support the notion that mirtrons are spliced in a similar fashion to mRNA instead of being processed by Drosha. For mammalian mirtrons we have found no such bias, but we noticed that these also display several important differences when compared to the invertebrate mirtrons which were considered in this work, such as differences in length and free energy.



FIG. 3. Panels: (a) mammalians, (b) primate specific, (c) primate candidate.